JAVA-5949 preserve connection pool on backpressure errors when establishing connections by nhachicha · Pull Request #1900 · mongodb/mongo-java-driver

nhachicha · 2026-02-25T13:45:19Z

The original PR #1854 was accidentally closed; it had no outstanding review comments.

This current PR depends on #1856:

we should merge the latter into main,
then merge main in backpressure,
then merge backpressure into the branch for this PR,
and only then can we merge this PR into backpressure.

Specification changes:

DRIVERS-3218 Avoid clearing the connection pool when the server connection rate limiter triggers (#1855)
DRIVERS-3218: update SDAM error handling tests to ignore handshake failures (#1860)
DRIVERS-3218 Clarify interaction between TLS errors and backpressure (#1879) - the change has not yet been addressed in this PR, because it was introduced after creation of the PR.
DRIVERS-3218 clarify label handling (#1892) - the change has not yet been addressed in this PR, because it was introduced after creation of the PR.
DRIVERS-3367: Fix racy backpressure-network-* tests (#1880) - just a unified spec tests (JSON) update; the change has not yet been addressed in this PR, because it was introduced after creation of the PR.
DRIVERS-3394: adjust spec test description (#1894) - just a unified spec tests (JSON) update; the change has not yet been addressed in this PR, because it was introduced after creation of the PR.

JAVA-5949, JAVA-6056, JAVA-6095

…establishing connections

- Fixing static check analysis

codeowners-service-app · 2026-02-25T16:59:25Z

Assigned stIncMale for team dbx-java because vbabanin is out of office.

stIncMale · 2026-03-04T22:39:28Z

Added the following to the description of the PR (a new piece of work to be done within the PR): DRIVERS-3394: adjust spec test description (#1894) (JAVA-6095).

vbabanin · 2026-03-10T02:39:54Z

driver-sync/src/test/functional/com/mongodb/client/ServerDiscoveryAndMonitoringProseTests.java

+            boolean terminated = executor.awaitTermination(20, SECONDS);
+            assertTrue("Executor did not terminate within timeout", terminated);
+
+            // Assert at least 10 ConnectionCheckOutFailedEvents occurred


Do we need this comment? The assertion message already explains the intent (e.g., “Expected at least 10 ConnectionCheckOutFailedEvents, but got …”).

vbabanin · 2026-03-10T02:40:49Z

driver-sync/src/test/functional/com/mongodb/client/ServerDiscoveryAndMonitoringProseTests.java

+            assertTrue("Expected at least 10 ConnectionCheckOutFailedEvents, but got " + connectionCheckOutFailedEventCount.get(),
+                    connectionCheckOutFailedEventCount.get() >= 10);
+
+            // Assert 0 PoolClearedEvents occurred


The same as in: https://github.com/mongodb/mongo-java-driver/pull/1900/changes#r2908976631

vbabanin · 2026-03-10T02:43:05Z

driver-sync/src/test/functional/com/mongodb/client/ServerDiscoveryAndMonitoringProseTests.java

+            // Teardown: sleep 1 second and reset rate limiter
+            Thread.sleep(1000);
+            adminDatabase.runCommand(new Document("setParameter", 1)
+                    .append("ingressConnectionEstablishmentRateLimiterEnabled", false));


This cleanup is currently conditional on the code above completing successfully. If an assertion or exception happens earlier, this teardown won’t run, which can leak state into subsequent tests.

We should move this cleanup into @AfterEach/afterEach so it runs reliably regardless of how the test exits.
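
To illustrate the failure mode, here is a minimal stand-alone sketch. It uses only the JDK, with try/finally standing in for what an @AfterEach method would guarantee in JUnit 5. The class `TeardownSketch` and the flag `rateLimiterEnabled` are hypothetical stand-ins for the test class and the server parameter; nothing here is taken from the PR's code.

```java
/**
 * Hypothetical sketch: state changed during a test must be reset even when
 * the test body throws. In JUnit 5 the reset would live in an @AfterEach method.
 */
final class TeardownSketch {
    /** Stand-in for server-side state the test modifies (assumption, not driver code). */
    static boolean rateLimiterEnabled = false;

    private TeardownSketch() {
    }

    static void runTest(final Runnable testBody) {
        rateLimiterEnabled = true;          // setup: enable the limiter for the test
        try {
            testBody.run();                 // assertions in here may throw
        } finally {
            rateLimiterEnabled = false;     // teardown: runs on success and on failure
        }
    }
}
```

If the reset were placed after `testBody.run()` without the `finally`, a failing assertion would leave `rateLimiterEnabled` set for every subsequent test.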

vbabanin · 2026-03-10T03:22:21Z

driver-sync/src/test/functional/com/mongodb/client/ServerDiscoveryAndMonitoringProseTests.java

+        AtomicInteger connectionCheckOutFailedEventCount = new AtomicInteger(0);
+        AtomicInteger poolClearedEventCount = new AtomicInteger(0);
+
+        ConnectionPoolListener connectionPoolListener = new ConnectionPoolListener() {
+            @Override
+            public void connectionCheckOutFailed(final ConnectionCheckOutFailedEvent event) {
+                connectionCheckOutFailedEventCount.incrementAndGet();
+            }
+
+            @Override
+            public void connectionPoolCleared(final ConnectionPoolClearedEvent event) {
+                poolClearedEventCount.incrementAndGet();
+            }
+        };


Instead of introducing a new anonymous listener with counters, we can reuse the existing test listener:

TestConnectionPoolListener connectionPoolListener = new TestConnectionPoolListener();

It already provides await helpers that double as assertions and helpers to assert that zero PoolClearedEvents happened, e.g.:

connectionPoolListener.waitForEvent(ConnectionPoolClearedEvent.class, 1, 110, SECONDS);

This keeps the test more concise and reuses established utilities for clarity/consistency.
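
For readers unfamiliar with this style of helper, the sketch below shows the general idea of a recording listener whose wait method doubles as an assertion. It is not the driver's TestConnectionPoolListener: the class `RecordingListenerSketch` and its methods are hypothetical and written against the JDK only.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of an event-recording listener with an await helper. */
final class RecordingListenerSketch {
    private final ConcurrentHashMap<Class<?>, AtomicInteger> counts = new ConcurrentHashMap<>();

    /** Records one event and wakes up any thread waiting in waitForEvent. */
    void record(final Object event) {
        counts.computeIfAbsent(event.getClass(), k -> new AtomicInteger()).incrementAndGet();
        synchronized (this) {
            notifyAll();
        }
    }

    /** Returns how many events of exactly this type have been recorded. */
    int count(final Class<?> eventType) {
        AtomicInteger counter = counts.get(eventType);
        return counter == null ? 0 : counter.get();
    }

    /** Blocks until at least minCount events were recorded; throws AssertionError on timeout. */
    synchronized void waitForEvent(final Class<?> eventType, final int minCount,
                                   final long timeout, final TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        try {
            while (count(eventType) < minCount) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    throw new AssertionError("Expected at least " + minCount + " "
                            + eventType.getSimpleName() + " events, but got " + count(eventType));
                }
                TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for events", e);
        }
    }
}
```

With a helper of this shape, the test no longer needs hand-written counters, and checking `count(...) == 0` after the workload finishes covers the "no pool cleared" expectation.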

vbabanin · 2026-03-10T04:24:21Z

driver-core/src/main/com/mongodb/internal/connection/DefaultSdamServerDescriptionManager.java

+            // clear the pool as they're not related to overload.
+            // TLS configuration errors (certificate validation, protocol mismatches) should also clear the pool
+            // as they indicate configuration issues, not server overload.
+            if (beforeHandshake && !sdamIssue.relatedToAuth() && !sdamIssue.relatedToTlsConfigurationError()) {


Currently we attach SystemOverloadedError and RetryableError labels in the SDAM error-handling path (effectively only for DefaultServer). In load-balanced mode, SDAM isn’t involved: the LB code path invalidates the pool directly (e.g., connectionPool.invalidate(serviceId, generation)), so the labeling logic is bypassed.

This means users running the driver in LB mode (behind an NLB) can still hit network errors, TLS handshake failures, or timeouts during connection establishment or hello, but won’t get the labels.

However, these labels are a CMAP requirement, not SDAM. The CMAP spec states: “The pool MUST add the error labels SystemOverloadedError and RetryableError to network errors or network timeouts it encounters during the connection establishment or the hello message.”

Since this is defined as a pool behavior (topology-agnostic), it seems we should implement the labeling in the connection pool layer so it applies consistently in both default and load-balanced modes.
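
A rough sketch of what pool-level labeling could look like, independent of topology. All names here (`LabeledNetworkException`, `PoolLabelingSketch`, `establish`) are hypothetical and use only the JDK; the driver's real exception types and pool classes are not shown. The sketch also labels every IOException and ignores the exclusions the spec defines.

```java
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.Callable;

/** Hypothetical exception type that carries error labels (not a driver class). */
final class LabeledNetworkException extends RuntimeException {
    private final Set<String> labels = new LinkedHashSet<>();

    LabeledNetworkException(final String message, final Throwable cause) {
        super(message, cause);
    }

    void addLabel(final String label) {
        labels.add(label);
    }

    boolean hasLabel(final String label) {
        return labels.contains(label);
    }
}

/** Hypothetical sketch: the pool labels establishment failures before SDAM or LB code sees them. */
final class PoolLabelingSketch {
    private PoolLabelingSketch() {
    }

    /** Runs the open-and-handshake step and labels network failures raised by it. */
    static <T> T establish(final Callable<T> openAndHandshake) {
        try {
            return openAndHandshake.call();
        } catch (IOException e) {
            // Network error or timeout during establishment: attach both labels here,
            // so the result is the same in default and load-balanced modes.
            LabeledNetworkException labeled =
                    new LabeledNetworkException("Connection establishment failed", e);
            labeled.addLabel("SystemOverloadedError");
            labeled.addLabel("RetryableError");
            throw labeled;
        } catch (RuntimeException e) {
            throw e;                        // non-network failures pass through unlabeled
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the labels are attached where the connection is established, callers above the pool only need to inspect them, whichever topology mode is in use.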

vbabanin · 2026-03-10T04:49:55Z

driver-core/src/main/com/mongodb/internal/connection/DefaultSdamServerDescriptionManager.java

+            // clear the pool as they're not related to overload.
+            // TLS configuration errors (certificate validation, protocol mismatches) should also clear the pool
+            // as they indicate configuration issues, not server overload.
+            if (beforeHandshake && !sdamIssue.relatedToAuth() && !sdamIssue.relatedToTlsConfigurationError()) {


Currently we don’t distinguish DNS lookup failures (UnknownHostException) from other connection-establishment network errors. As a result, a DNS failure goes through the generic path (same as connection reset/timeout) and gets SystemOverloadedError/RetryableError labels.

The CMAP spec excludes DNS failures from backpressure labeling: “For errors that the driver can distinguish as never occurring due to server overload, such as DNS lookup failures […] the driver MUST NOT add backpressure error labels for these error types.”

Proposed change: detect a DNS failure by walking the exception cause chain for UnknownHostException (it’s wrapped as MongoSocketException from ServerAddressHelper.getSocketAddresses()) and, when one is present, skip backpressure label attachment so SDAM follows the normal path (clear the pool and mark the server Unknown).

In that case, we should add coverage that asserts both the labeling and the pool-clearing behaviour. If the driver ever changes the wrapper exception type MongoSocketException (or stops wrapping UnknownHostException this way) and starts adding labels, the test should fail.
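
The cause-chain check could look roughly like the sketch below. The helper `DnsFailureCheck.causedByDnsFailure` is a hypothetical name, and the depth bound is an added safeguard against cyclic cause chains; only `UnknownHostException` and `Throwable.getCause()` come from the JDK.

```java
import java.net.UnknownHostException;

/** Hypothetical helper: detects a DNS lookup failure anywhere in a cause chain. */
final class DnsFailureCheck {
    /** Upper bound on chain length, guarding against cyclic cause chains (assumption). */
    private static final int MAX_DEPTH = 32;

    private DnsFailureCheck() {
    }

    /** Returns true if the throwable or any of its causes is an UnknownHostException. */
    static boolean causedByDnsFailure(final Throwable throwable) {
        Throwable current = throwable;
        for (int depth = 0; current != null && depth < MAX_DEPTH; depth++) {
            if (current instanceof UnknownHostException) {
                return true;
            }
            current = current.getCause();
        }
        return false;
    }
}
```

Checking the whole chain rather than only the direct cause keeps the check working if an extra wrapper is added later, which is the regression the proposed test is meant to catch.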

nhachicha added 9 commits February 25, 2026 13:37

Fixes JAVA-5949 prevent connection churn on backpressure errors when …

c1d40fa

…establishing connections

Remove handshake and update submodule including new tests

a0a7dbf

Update spec test; fix test runner

fc3603e

Add prose test

e78c931

Increasing the timeout termination

4d836b1

- Revert spec timeout

61c53a8

- Fixing static check analysis

Update exception checks

37c930b

Increase timeout for operations to complete

4f5ae84

Simplifying conditions check

0be0250

nhachicha self-assigned this Feb 25, 2026

nhachicha requested a review from vbabanin February 25, 2026 16:56

nhachicha marked this pull request as ready for review February 25, 2026 16:56

nhachicha requested a review from a team as a code owner February 25, 2026 16:56

codeowners-service-app bot requested a review from stIncMale February 25, 2026 16:59

stIncMale mentioned this pull request Feb 25, 2026

[JAVA-6033] ServerHeartbeatSucceededEvent is not fired for initial POLL monitoring #1856

Open

katcharov removed the request for review from stIncMale March 3, 2026 16:34

vbabanin requested changes Mar 10, 2026

nhachicha wants to merge 9 commits into mongodb:backpressure from nhachicha:nh/backpressure/preserve_connection_pool


3 participants
